Characterizing the Temperature of SAT Formulas
The remarkable advances in SAT solving achieved in recent years have made it possible to apply this technology to many real-world applications, such as planning, formal verification and cryptography, among others. Interestingly, these industrial SAT problems are commonly believed to be easier than classical random SAT formulas, but estimating their actual hardness remains a very challenging question, which in some cases even requires solving them. In this context, realistic pseudo-industrial random SAT generators have emerged with the aim of reproducing the main features of these application problems, in order to better understand the success of SAT solving techniques on them. In this work, we present a model to estimate the temperature of real-world SAT instances. This temperature represents the degree of distortion of the expected structure of the formula, from highly structured benchmarks (more similar to real-world SAT instances) to the complete absence of structure (observed in the classical random SAT model). Our solution is based on the popularity–similarity random model for SAT, which was recently introduced to reproduce two crucial features of application SAT benchmarks: scale-free and community structure. This model is able to control the hardness of the generated formula by introducing some randomization into the expected structure. Using our regression model, we observe that the estimated temperature of the application benchmarks used in recent SAT Competitions correlates with their hardness in most cases.
Juan de la Cierva program, fellowship IJC2019-040489-I, funded by MCIN and AE
Improving Skip-Gram based Graph Embeddings via Centrality-Weighted Sampling
Network embedding techniques inspired by word2vec constitute an effective unsupervised relational learning model. Commonly, by means of a Skip-Gram procedure, these techniques learn low-dimensional vector representations of the nodes in a graph by sampling node-context examples. Although many ways of sampling the context of a node have been proposed, the effects of how the node itself is chosen have not been analyzed in depth. To fill this gap, we have re-implemented the four main word2vec-inspired graph embedding techniques under the same framework and analyzed how different sampling distributions affect embedding performance when tested on node classification problems. We present a set of experiments on several well-known real datasets showing that sampling from popular centrality distributions leads to improvements, with learning-time speedups of up to 2x and increased accuracy in all cases.
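To illustrate the idea described above, the following self-contained Python sketch weights random-walk start nodes by degree centrality instead of choosing them uniformly. This is only a minimal sketch of the general technique; the graph, parameters and function names are illustrative, not taken from the paper's implementation.

```python
import random

# Toy adjacency list; any undirected graph works here.
graph = {
    0: [1, 2, 3],
    1: [0, 2],
    2: [0, 1, 3],
    3: [0, 2],
}

def degree_centrality(g):
    # Degree of each node normalized by the maximum possible degree.
    n = len(g) - 1
    return {v: len(nbrs) / n for v, nbrs in g.items()}

def sample_walks(g, num_walks, walk_len, rng):
    # Start nodes are drawn proportionally to degree centrality
    # (the centrality-weighted sampling studied in the paper);
    # each step of the walk is a uniform choice among neighbours.
    cent = degree_centrality(g)
    nodes = list(g)
    weights = [cent[v] for v in nodes]
    walks = []
    for _ in range(num_walks):
        v = rng.choices(nodes, weights=weights, k=1)[0]
        walk = [v]
        for _ in range(walk_len - 1):
            v = rng.choice(g[v])
            walk.append(v)
        walks.append(walk)
    return walks

rng = random.Random(0)
walks = sample_walks(graph, num_walks=5, walk_len=4, rng=rng)
```

The resulting walks would then be fed to a standard Skip-Gram model (e.g. word2vec over node-id "sentences"); only the start-node distribution changes with respect to uniform sampling.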
Knowledge discovery in multi-relational graphs
Given the limited range of methodologies for carrying out relational machine learning tasks, the main goal of this thesis is to analyze the existing methods, modifying or optimizing some of them where possible, and to contribute new methods that open new ways to approach this difficult task. To this end, and leaving aside goals related to literature reviews or comparisons between models and implementations, a series of concrete objectives is set out:
1. To define flexible and powerful structures that allow phenomena to be modeled in terms of their constituent elements and the relations established between them. These structures must be able to express naturally complex properties of the elements (continuous or categorical values, vectors, matrices, dictionaries, graphs, ...), as well as heterogeneous relations between them that may in turn carry the same level of complex properties. Moreover, these structures must allow the modeling of phenomena in which the relations between elements are not always binary (involving only two elements), but may involve any number of them.
2. To define tools to build, manipulate and measure these structures. However powerful and flexible a structure may be, it is of little use without adequate tools to manipulate and study it. These tools must be efficiently implemented and cover both construction and querying tasks.
3. To develop new black-box relational machine learning algorithms. In tasks where the goal is not to obtain explanatory models, we can afford to use black-box models, sacrificing interpretability in favor of greater computational efficiency.
4. To develop new white-box relational machine learning algorithms. When we are interested in an explanation of how the systems under analysis work, we will look for white-box machine learning models.
5. To improve query, analysis and repair tools for databases. Some long-distance queries in databases carry too high a computational cost, which prevents adequate analyses in some information systems. Moreover, graph databases lack methods to normalize or repair data automatically or under human supervision. It is worthwhile to develop tools that carry out this kind of task with greater efficiency, offering a new query and normalization layer that allows the data to be curated for more optimal storage and retrieval.
All the stated objectives are developed on a solid formal basis, grounded in Information Theory, Learning Theory, Artificial Neural Network Theory and Graph Theory. This basis ensures that the results obtained are formal enough for the contributions to be easily evaluated. In addition, the abstract models developed are readily implementable on real machines, so that their behavior can be verified experimentally and useful solutions can be offered to the scientific community within a short time frame.
Detecting the ultra low dimensionality of real networks
Reducing dimension redundancy to find simplifying patterns in high-dimensional datasets and complex networks has become a major endeavor in many scientific fields. However, detecting the dimensionality of their latent space is challenging but necessary to generate efficient embeddings to be used in a multitude of downstream tasks. Here, we propose a method to infer the dimensionality of networks without the need for any a priori spatial embedding. Due to the ability of hyperbolic geometry to capture the complex connectivity of real networks, we detect ultra-low dimensionality, far below values reported using other approaches. We applied our method to real networks from different domains and found unexpected regularities, including: tissue-specific biomolecular networks being extremely low dimensional; brain connectomes being close to the three dimensions of their anatomical embedding; and social networks and the Internet requiring slightly higher dimensionality. Beyond paving the way towards an ultra-efficient dimensional reduction, our findings help address fundamental issues that hinge on dimensionality, such as universality in critical behavior.
Agencia Estatal de Investigación PID2019-106290GB-C22/AEI/10.13039/501100011033; Generalitat de Catalunya 2017SGR106
A Multi-Relational Graph Generator Based on Social Networks Data
The tool introduced in this paper, CorpuRed, allows obtaining a dataset from online social networks that can be used in research projects requiring information about social behaviour on the Internet. The way the data are obtained is slightly platform dependent (the Facebook case is described), and they are stored in a graph database that will be accessible through an academic-license API.
On the Temperature of SAT Formulas
The remarkable advances in SAT solving achieved in recent years have made it possible to apply this technology to many real-world applications of Artificial Intelligence, such as planning, formal verification, and scheduling, among others. Interestingly, these industrial SAT problems are commonly believed to be easier than classical random SAT formulas, but estimating their actual hardness remains a very challenging question, which in some cases even requires solving them. In this context, realistic pseudo-industrial random SAT generators have emerged with the aim of reproducing the main features shared by the majority of these application problems. The study of these models may help to better understand the success of those SAT solving techniques and possibly improve them. In this work, we present a model to estimate the temperature of real-world SAT instances. This temperature represents the degree of distortion of the expected structure of the formula, from highly structured benchmarks (more similar to real-world SAT instances) to the complete absence of structure (observed in the classical random SAT model). Our solution is based on the Popularity-Similarity (PS) random model for SAT, which was recently introduced to reproduce two crucial features of application SAT benchmarks: scale-free and community structure. The PS model is able to control the hardness of the generated formula by introducing some randomization into the expected structure. Our solution is a first step towards a hardness oracle based on the temperature of SAT formulas, which may be able to estimate the cost of solving real-world SAT instances without solving them.
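The notion of temperature-controlled distortion can be pictured with a toy sketch. The snippet below is purely illustrative and is not the PS model's actual procedure: it simply treats a temperature T in [0, 1] as the probability of replacing each literal's variable with a uniformly random one, so T = 0 leaves the formula's structure untouched and T = 1 erases it.

```python
import random

def randomize_formula(clauses, num_vars, T, rng):
    # Hypothetical illustration of temperature-driven distortion:
    # each literal's variable is replaced by a uniformly random one
    # with probability T; the literal's polarity is preserved.
    noisy = []
    for clause in clauses:
        new_clause = []
        for lit in clause:
            if rng.random() < T:
                var = rng.randint(1, num_vars)
                lit = var if lit > 0 else -var
            new_clause.append(lit)
        noisy.append(new_clause)
    return noisy

rng = random.Random(42)
structured = [[1, -2, 3], [-1, 2], [2, -3, 4]]
assert randomize_formula(structured, 4, 0.0, rng) == structured  # T = 0: untouched
shuffled = randomize_formula(structured, 4, 1.0, rng)            # T = 1: random variables
```

Estimating the temperature of a given instance, as the paper proposes, amounts to inverting such a process: inferring how much randomization would be needed for a structured model to produce the observed formula.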